Overview

Dataset statistics

Number of variables: 34
Number of observations: 858
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 23
Duplicate rows (%): 2.7%
Total size in memory: 228.0 KiB
Average record size in memory: 272.1 B
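These memory figures are consistent with every column being stored as 8-byte floats that share one small index. The sketch below reproduces them under two assumptions: all 34 columns are float64, and the index is a default RangeIndex that pandas sizes at 128 bytes. It also shows one common way to count duplicate rows, where every occurrence of a row after its first counts as a duplicate. The helper names are illustrative.

```python
from collections import Counter

FLOAT64_BYTES = 8         # assumption: every column is stored as float64
RANGE_INDEX_BYTES = 128   # assumption: size pandas reports for a default RangeIndex

def estimated_memory(n_rows, n_cols):
    """Bytes for n_cols float64 columns that share one RangeIndex."""
    return n_rows * n_cols * FLOAT64_BYTES + RANGE_INDEX_BYTES

def duplicate_row_count(rows):
    """Rows identical to an earlier row (every occurrence after the first)."""
    counts = Counter(tuple(row) for row in rows)
    return sum(c - 1 for c in counts.values())

total = estimated_memory(858, 34)
print(f"{total / 1024:.1f} KiB, {total / 858:.1f} B per record")  # 228.0 KiB, 272.1 B per record
print(f"{estimated_memory(858, 1) / 1024:.1f} KiB per column")    # 6.8 KiB per column
print(duplicate_row_count([(1, 2), (1, 2), (3, 4), (1, 2)]))       # 2
```

The per-column result matches the "Memory size: 6.8 KiB" reported for every variable, but only if the index is included in each column's size.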

Variable types

Numeric: 9
Categorical: 25

Warnings

STDs:cervical condylomatosis has constant value "0.0" (Constant)
STDs:AIDS has constant value "0.0" (Constant)
Dataset has 23 (2.7%) duplicate rows (Duplicates)
STDs is highly correlated with STDs (number) and 1 other field (High correlation)
STDs (number) is highly correlated with STDs (High correlation)
STDs:condylomatosis is highly correlated with STDs:vulvo-perineal condylomatosis (High correlation)
STDs:vulvo-perineal condylomatosis is highly correlated with STDs:condylomatosis (High correlation)
STDs: Number of diagnosis is highly correlated with STDs (High correlation)
STDs:syphilis is highly correlated with STDs:AIDS and 1 other field (High correlation)
STDs:Hepatitis B is highly correlated with STDs:AIDS and 1 other field (High correlation)
IUD is highly correlated with STDs:AIDS and 1 other field (High correlation)
Dx:Cancer is highly correlated with STDs:AIDS and 1 other field (High correlation)
Dx:HPV is highly correlated with STDs:AIDS and 1 other field (High correlation)
STDs:condylomatosis is highly correlated with STDs:AIDS and 2 other fields (High correlation)
STDs:genital herpes is highly correlated with STDs:AIDS and 1 other field (High correlation)
Schiller is highly correlated with STDs:AIDS and 1 other field (High correlation)
STDs: Number of diagnosis is highly correlated with STDs:AIDS and 1 other field (High correlation)
STDs:AIDS is highly correlated with STDs:syphilis and 23 other fields (High correlation)
Hormonal Contraceptives is highly correlated with STDs:AIDS and 1 other field (High correlation)
STDs:vulvo-perineal condylomatosis is highly correlated with STDs:condylomatosis and 2 other fields (High correlation)
Hinselmann is highly correlated with STDs:AIDS and 1 other field (High correlation)
STDs:molluscum contagiosum is highly correlated with STDs:AIDS and 1 other field (High correlation)
STDs:vaginal condylomatosis is highly correlated with STDs:AIDS and 1 other field (High correlation)
Biopsy is highly correlated with STDs:AIDS and 1 other field (High correlation)
Dx:CIN is highly correlated with STDs:AIDS and 1 other field (High correlation)
Dx is highly correlated with STDs:AIDS and 1 other field (High correlation)
STDs is highly correlated with STDs:AIDS and 1 other field (High correlation)
STDs:HPV is highly correlated with STDs:AIDS and 1 other field (High correlation)
STDs:pelvic inflammatory disease is highly correlated with STDs:AIDS and 1 other field (High correlation)
STDs:cervical condylomatosis is highly correlated with STDs:syphilis and 23 other fields (High correlation)
STDs:HIV is highly correlated with STDs:AIDS and 1 other field (High correlation)
Citology is highly correlated with STDs:AIDS and 1 other field (High correlation)
Smokes is highly correlated with STDs:AIDS and 1 other field (High correlation)
Num of pregnancies has 16 (1.9%) zeros (Zeros)
Smokes (years) has 722 (84.1%) zeros (Zeros)
Smokes (packs/year) has 722 (84.1%) zeros (Zeros)
Hormonal Contraceptives (years) has 269 (31.4%) zeros (Zeros)
IUD (years) has 658 (76.7%) zeros (Zeros)
STDs (number) has 674 (78.6%) zeros (Zeros)
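The report does not say which association measure or threshold produced the High correlation warnings. For pairs of categorical variables, Cramér's V is a common choice. The sketch below computes the plain, uncorrected statistic from a contingency table. It illustrates the measure and is not necessarily the variant pandas-profiling applied. With this formula the statistic is undefined when a variable has only one category, which is the case for the constant columns STDs:AIDS and STDs:cervical condylomatosis.

```python
import math

def cramers_v(table):
    """Plain (uncorrected) Cramér's V for a contingency table given as rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    if k == 0:
        # One variable has a single category, so V is undefined.
        return float("nan")
    return math.sqrt(chi2 / (n * k))

print(cramers_v([[10, 0], [0, 10]]))  # 1.0 (perfect association)
print(cramers_v([[5, 5], [5, 5]]))    # 0.0 (independence)
```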

Reproduction

Analysis started: 2021-02-17 06:35:30.156620
Analysis finished: 2021-02-17 06:36:01.266636
Duration: 31.11 seconds
Software version: pandas-profiling v2.10.1
Download configuration: config.yaml

Variables

Age
Real number (ℝ≥0)

Distinct: 44
Distinct (%): 5.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26.82051282
Minimum: 13
Maximum: 84
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.8 KiB

Quantile statistics

Minimum: 13
5th percentile: 16
Q1: 20
Median: 25
Q3: 32
95th percentile: 41
Maximum: 84
Range: 71
Interquartile range (IQR): 12
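These quantiles can be reproduced by linear interpolation between order statistics, the default method of `pandas.Series.quantile` and `numpy.percentile`. This sketch assumes the profiler relies on that default, which fits the non-integer cut points in this report such as the 95th percentile of 9.15 for Smokes (years). It uses only the standard library, and the function name is illustrative.

```python
import math

def quantile(values, q):
    """Linear-interpolation quantile, 0 <= q <= 1 (numpy/pandas 'linear' method)."""
    xs = sorted(values)
    pos = q * (len(xs) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
q1, q3 = quantile(data, 0.25), quantile(data, 0.75)
print(q1, q3, q3 - q1)  # 3.25 7.75 4.5
```

The IQR is simply Q3 minus Q1. For Age that is 32 - 20 = 12.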

Descriptive statistics

Standard deviation: 8.497948065
Coefficient of variation (CV): 0.3168451
Kurtosis: 4.778575148
Mean: 26.82051282
Median Absolute Deviation (MAD): 5.5
Skewness: 1.394278767
Sum: 23012
Variance: 72.21512132
Monotonicity: Not monotonic
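The coefficient of variation is the standard deviation divided by the mean (8.497948065 / 26.82051282 ≈ 0.3168). The MAD is the median of the absolute deviations from the median. This is a minimal sketch. It assumes the sample standard deviation (ddof=1), which is the pandas default, and an unscaled MAD. The helper name is illustrative.

```python
import statistics

def cv_and_mad(values):
    """Coefficient of variation (sample std / mean) and median absolute deviation."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation, ddof=1
    med = statistics.median(values)
    mad = statistics.median(abs(x - med) for x in values)
    return std / mean, mad

cv, mad = cv_and_mad([1, 2, 3, 4, 5])
print(round(cv, 4), mad)  # 0.527 1
```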
Histogram with fixed size bins (bins=44)
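Common values
Value | Count | Frequency (%)
23 | 54 | 6.3%
18 | 50 | 5.8%
21 | 46 | 5.4%
20 | 45 | 5.2%
19 | 44 | 5.1%
24 | 39 | 4.5%
25 | 39 | 4.5%
26 | 38 | 4.4%
28 | 37 | 4.3%
30 | 35 | 4.1%
Other values (34) | 431 | 50.2%

Minimum 5 values
Value | Count | Frequency (%)
13 | 1 | 0.1%
14 | 5 | 0.6%
15 | 21 | 2.4%
16 | 23 | 2.7%
17 | 35 | 4.1%

Maximum 5 values
Value | Count | Frequency (%)
84 | 1 | 0.1%
79 | 1 | 0.1%
70 | 2 | 0.2%
59 | 1 | 0.1%
52 | 2 | 0.2%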

Number of sexual partners
Real number (ℝ≥0)

Distinct: 13
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.527644231
Minimum: 1
Maximum: 28
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.8 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 2
Q3: 3
95th percentile: 5
Maximum: 28
Range: 27
Interquartile range (IQR): 1

Descriptive statistics

Standard deviation: 1.642267047
Coefficient of variation (CV): 0.6497223885
Kurtosis: 71.44822647
Mean: 2.527644231
Median Absolute Deviation (MAD): 1
Skewness: 5.538918395
Sum: 2168.71875
Variance: 2.697041053
Monotonicity: Not monotonic
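A kurtosis of 71.4 and a skewness of 5.5 reflect a long right tail: the maximum is 28 while the median is 2. The sketch below computes bias-corrected sample skewness (G1) and excess kurtosis (G2). To our understanding these are the estimators behind pandas `Series.skew()` and `Series.kurt()`. The sketch assumes at least four values with nonzero variance, and the function name is illustrative.

```python
import statistics

def skew_kurt(values):
    """Bias-corrected sample skewness (G1) and excess kurtosis (G2); needs n >= 4, nonzero variance."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5          # population skewness
    g2 = m4 / m2 ** 2 - 3        # population excess kurtosis
    skew = (n * (n - 1)) ** 0.5 / (n - 2) * g1
    kurt = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))
    return skew, kurt

print(skew_kurt([1, 2, 3, 4, 5]))       # skewness 0, excess kurtosis about -1.2
print(skew_kurt([1, 1, 2, 2, 3, 28]))   # long right tail: both values positive
```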
Histogram with fixed size bins (bins=13)
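Common values
Value | Count | Frequency (%)
2 | 272 | 31.7%
3 | 208 | 24.2%
1 | 206 | 24.0%
4 | 78 | 9.1%
5 | 44 | 5.1%
2.527644231 | 26 | 3.0%
6 | 9 | 1.0%
7 | 7 | 0.8%
8 | 4 | 0.5%
9 | 1 | 0.1%
Other values (3) | 3 | 0.3%

Minimum 5 values
Value | Count | Frequency (%)
1 | 206 | 24.0%
2 | 272 | 31.7%
2.527644231 | 26 | 3.0%
3 | 208 | 24.2%
4 | 78 | 9.1%

Maximum 5 values
Value | Count | Frequency (%)
28 | 1 | 0.1%
15 | 1 | 0.1%
10 | 1 | 0.1%
9 | 1 | 0.1%
8 | 4 | 0.5%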
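The non-integer value 2.527644231, held by 26 rows, equals the reported mean to the displayed precision. Num of pregnancies shows the same pattern with 2.275561097 in 56 rows. This suggests, although the report does not say so, that missing entries were filled with the column mean before profiling. Filling with the mean leaves the mean unchanged, so such fill values can be found by comparing each entry with the mean. This sketch uses toy data and an illustrative helper name.

```python
import statistics

def mean_fill_count(values, rel_tol=1e-9):
    """Count entries that equal the column mean (within a relative tolerance)."""
    mean = statistics.fmean(values)
    return sum(1 for x in values if abs(x - mean) <= rel_tol * max(1.0, abs(mean)))

observed = [1, 2, 2, 3, 5]
fill = statistics.fmean(observed)   # 2.6, the mean of the observed values
imputed = observed + [fill] * 3     # three hypothetical missing entries filled with the mean
print(statistics.fmean(imputed))    # the mean is unchanged by mean imputation
print(mean_fill_count(imputed))     # 3
```

With real integer-valued counts such as numbers of partners, the fill value stands out because it is the only non-integer.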

First sexual intercourse
Real number (ℝ≥0)

Distinct: 22
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16.99529965
Minimum: 10
Maximum: 32
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.8 KiB

Quantile statistics

Minimum: 10
5th percentile: 14
Q1: 15
Median: 17
Q3: 18
95th percentile: 22
Maximum: 32
Range: 22
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.791882967
Coefficient of variation (CV): 0.1642738301
Kurtosis: 4.348131925
Mean: 16.99529965
Median Absolute Deviation (MAD): 2
Skewness: 1.570772719
Sum: 14581.9671
Variance: 7.7946105
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=22)
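Common values
Value | Count | Frequency (%)
15 | 163 | 19.0%
17 | 151 | 17.6%
18 | 137 | 16.0%
16 | 121 | 14.1%
14 | 79 | 9.2%
19 | 60 | 7.0%
20 | 37 | 4.3%
13 | 25 | 2.9%
21 | 20 | 2.3%
22 | 9 | 1.0%
Other values (12) | 56 | 6.5%

Minimum 5 values
Value | Count | Frequency (%)
10 | 2 | 0.2%
11 | 2 | 0.2%
12 | 6 | 0.7%
13 | 25 | 2.9%
14 | 79 | 9.2%

Maximum 5 values
Value | Count | Frequency (%)
32 | 1 | 0.1%
29 | 5 | 0.6%
28 | 3 | 0.3%
27 | 6 | 0.7%
26 | 7 | 0.8%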
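For each numeric variable, the first frequency table lists the ten most frequent values and folds the rest into a single Other values row. The counts therefore sum to the 858 observations; for First sexual intercourse that is 802 + 56. This sketch shows the grouping with `collections.Counter`. The function name `common_values` is illustrative.

```python
from collections import Counter

def common_values(values, top=10):
    """Most frequent values as (value, count, share) rows, plus one 'Other values (k)' row."""
    counts = Counter(values)
    n = len(values)
    ranked = counts.most_common()
    rows = [(value, count, count / n) for value, count in ranked[:top]]
    rest = ranked[top:]
    if rest:
        other = sum(count for _, count in rest)
        rows.append((f"Other values ({len(rest)})", other, other / n))
    return rows

for value, count, share in common_values([15, 15, 17, 17, 17, 16, 14, 13], top=2):
    print(value, count, f"{share:.1%}")
# 17 3 37.5%
# 15 2 25.0%
# Other values (3) 3 37.5%
```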

Num of pregnancies
Real number (ℝ≥0)

ZEROS

Distinct: 12
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.275561097
Minimum: 0
Maximum: 11
Zeros: 16
Zeros (%): 1.9%
Memory size: 6.8 KiB
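The Zeros percentages in this report are simple shares of the 858 observations. The sketch below recomputes the five flagged in the Warnings list. The helper name `zero_share` is hypothetical. The report does not show the threshold pandas-profiling uses to raise a Zeros warning, so the sketch assumes none.

```python
def zero_share(values):
    """Return (number of zeros, share of zeros) for a sequence of numbers."""
    zeros = sum(1 for x in values if x == 0)
    return zeros, zeros / len(values)

N = 858  # observations in this dataset
for name, zeros in [("Num of pregnancies", 16), ("Smokes (years)", 722),
                    ("Hormonal Contraceptives (years)", 269), ("IUD (years)", 658),
                    ("STDs (number)", 674)]:
    print(f"{name}: {zeros} ({zeros / N:.1%})")
# Num of pregnancies: 16 (1.9%)
# Smokes (years): 722 (84.1%)
# Hormonal Contraceptives (years): 269 (31.4%)
# IUD (years): 658 (76.7%)
# STDs (number): 674 (78.6%)
```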

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 1
Median: 2
Q3: 3
95th percentile: 5
Maximum: 11
Range: 11
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 1.399325141
Coefficient of variation (CV): 0.6149363083
Kurtosis: 3.646025962
Mean: 2.275561097
Median Absolute Deviation (MAD): 1
Skewness: 1.472193912
Sum: 1952.431421
Variance: 1.958110849
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
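Common values
Value | Count | Frequency (%)
1 | 270 | 31.5%
2 | 240 | 28.0%
3 | 139 | 16.2%
4 | 74 | 8.6%
2.275561097 | 56 | 6.5%
5 | 35 | 4.1%
6 | 18 | 2.1%
0 | 16 | 1.9%
7 | 6 | 0.7%
8 | 2 | 0.2%
Other values (2) | 2 | 0.2%

Minimum 5 values
Value | Count | Frequency (%)
0 | 16 | 1.9%
1 | 270 | 31.5%
2 | 240 | 28.0%
2.275561097 | 56 | 6.5%
3 | 139 | 16.2%

Maximum 5 values
Value | Count | Frequency (%)
11 | 1 | 0.1%
10 | 1 | 0.1%
8 | 2 | 0.2%
7 | 6 | 0.7%
6 | 18 | 2.1%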

Smokes
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.8 KiB
0.0: 722
1.0: 123
0.1455621301775148: 13

Length

Max length: 18
Median length: 3
Mean length: 3.227272727
Min length: 3

Characters and Unicode

Total characters: 2769
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
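The character statistics above treat each value as a string. This sketch rebuilds the Smokes column from its value counts, assuming the profiler sees exactly these string forms. It then classifies each character with `unicodedata.category` and reproduces the reported totals. `Nd` is the Unicode "Decimal Number" category and `Po` is "Other Punctuation".

```python
import unicodedata
from collections import Counter

# Assumption: the Smokes column as strings, rebuilt from its value counts
smokes = ["0.0"] * 722 + ["1.0"] * 123 + ["0.1455621301775148"] * 13

chars = Counter(ch for s in smokes for ch in s)
categories = Counter(unicodedata.category(ch) for s in smokes for ch in s)
lengths = [len(s) for s in smokes]

print(sum(chars.values()), chars["0"], chars["."])  # 2769 1593 858
print(categories["Nd"], categories["Po"])           # 1911 858
print(max(lengths), min(lengths))                   # 18 3
```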

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 1.0
5th row: 0.0

Value | Count | Frequency (%)
0.0 | 722 | 84.1%
1.0 | 123 | 14.3%
0.1455621301775148 | 13 | 1.5%
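The third Smokes value, 0.1455621301775148, equals 123 / (722 + 123): the share of 1.0 among the remaining 845 rows. The same pattern appears in Hormonal Contraceptives (108 rows), IUD (117 rows), STDs and the STDs: sub-columns (105 rows each). This suggests that missing flags were replaced by the column mean before profiling, although the report does not say so. The check below recomputes several of these fill values from the counts in this report. `implied_fill` is an illustrative name.

```python
def implied_fill(positives, negatives):
    """Share of 1.0 among the non-missing rows: the value a mean imputation would insert."""
    return positives / (positives + negatives)

# column: (rows with 1.0, rows with 0.0, third value shown in this report)
checks = {
    "Smokes": (123, 722, 0.1455621301775148),
    "Hormonal Contraceptives": (481, 269, 0.6413333333333333),
    "IUD": (83, 658, 0.11201079622132254),
    "STDs": (79, 674, 0.1049136786188579),
    "STDs:condylomatosis": (44, 709, 0.05843293492695883),
}
for name, (pos, neg, shown) in checks.items():
    print(name, abs(implied_fill(pos, neg) - shown) < 1e-12)  # True for each column
```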
Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 1593 | 57.5%
. | 858 | 31.0%
1 | 175 | 6.3%
5 | 39 | 1.4%
4 | 26 | 0.9%
7 | 26 | 0.9%
6 | 13 | 0.5%
2 | 13 | 0.5%
3 | 13 | 0.5%
8 | 13 | 0.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1911 | 69.0%
Other Punctuation | 858 | 31.0%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1593 | 83.4%
1 | 175 | 9.2%
5 | 39 | 2.0%
4 | 26 | 1.4%
7 | 26 | 1.4%
6 | 13 | 0.7%
2 | 13 | 0.7%
3 | 13 | 0.7%
8 | 13 | 0.7%

Other Punctuation
Value | Count | Frequency (%)
. | 858 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2769 | 100.0%

Most frequent character per script

Common: same counts and frequencies as Most occurring characters.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2769 | 100.0%

Most frequent character per block

ASCII: same counts and frequencies as Most occurring characters.

Smokes (years)
Real number (ℝ≥0)

ZEROS

Distinct: 31
Distinct (%): 3.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.219721413
Minimum: 0
Maximum: 37
Zeros: 722
Zeros (%): 84.1%
Memory size: 6.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 9.15
Maximum: 37
Range: 37
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 4.057884877
Coefficient of variation (CV): 3.326894843
Kurtosis: 24.1781232
Mean: 1.219721413
Median Absolute Deviation (MAD): 0
Skewness: 4.499581313
Sum: 1046.520972
Variance: 16.46642967
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=31)
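Common values
Value | Count | Frequency (%)
0 | 722 | 84.1%
1.266972909 | 15 | 1.7%
1.219721413 | 13 | 1.5%
5 | 9 | 1.0%
9 | 9 | 1.0%
1 | 8 | 0.9%
3 | 7 | 0.8%
2 | 7 | 0.8%
7 | 6 | 0.7%
16 | 6 | 0.7%
Other values (21) | 56 | 6.5%

Minimum 5 values
Value | Count | Frequency (%)
0 | 722 | 84.1%
0.16 | 1 | 0.1%
0.5 | 3 | 0.3%
1 | 8 | 0.9%
1.219721413 | 13 | 1.5%

Maximum 5 values
Value | Count | Frequency (%)
37 | 1 | 0.1%
34 | 1 | 0.1%
32 | 1 | 0.1%
28 | 1 | 0.1%
24 | 1 | 0.1%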
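For each numeric variable, the last two frequency tables list the five smallest and five largest distinct values with their counts. This is a sketch with the illustrative helper name `extreme_values`.

```python
from collections import Counter

def extreme_values(values, k=5):
    """(value, count) pairs for the k smallest and the k largest distinct values."""
    counts = Counter(values)
    distinct = sorted(counts)
    smallest = [(v, counts[v]) for v in distinct[:k]]
    largest = [(v, counts[v]) for v in reversed(distinct[-k:])]
    return smallest, largest

low, high = extreme_values([0, 0, 0, 0.5, 1, 1, 3, 24, 28, 32, 34, 37])
print(low)   # [(0, 3), (0.5, 1), (1, 2), (3, 1), (24, 1)]
print(high)  # [(37, 1), (34, 1), (32, 1), (28, 1), (24, 1)]
```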

Smokes (packs/year)
Real number (ℝ≥0)

ZEROS

Distinct: 63
Distinct (%): 7.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.4531439506
Minimum: 0
Maximum: 37
Zeros: 722
Zeros (%): 84.1%
Memory size: 6.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 2.415
Maximum: 37
Range: 37
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 2.209657329
Coefficient of variation (CV): 4.876281204
Kurtosis: 116.6421985
Mean: 0.4531439506
Median Absolute Deviation (MAD): 0
Skewness: 9.379886073
Sum: 388.7975097
Variance: 4.882585512
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
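Elsewhere in this report, the histogram bin count equals the number of distinct values: bins=44 for Age and bins=31 for Smokes (years). Here, 63 distinct values are shown with bins=50. That is consistent with a cap of 50 bins, but the report does not state the rule, so the `bin_count` helper below encodes only this inferred assumption. `fixed_bins` shows equal-width binning whose last bin includes the maximum, approximating `numpy.histogram`. Both names are illustrative.

```python
def bin_count(values, cap=50):
    """Assumed rule inferred from this report: one bin per distinct value, at most `cap`."""
    return min(len(set(values)), cap)

def fixed_bins(values, bins):
    """Counts for `bins` equal-width bins over [min, max]; the last bin includes the maximum."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    for x in values:
        index = int((x - lo) / width) if width > 0 else 0
        counts[min(index, bins - 1)] += 1
    return counts

print(bin_count(range(63)))            # 50
print(fixed_bins(list(range(11)), 5))  # [2, 2, 2, 2, 3]
```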
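Common values
Value | Count | Frequency (%)
0 | 722 | 84.1%
0.5132021277 | 18 | 2.1%
0.4531439506 | 13 | 1.5%
1 | 6 | 0.7%
3 | 5 | 0.6%
1.2 | 4 | 0.5%
0.05 | 4 | 0.5%
2 | 4 | 0.5%
0.2 | 4 | 0.5%
0.75 | 4 | 0.5%
Other values (53) | 74 | 8.6%

Minimum 5 values
Value | Count | Frequency (%)
0 | 722 | 84.1%
0.001 | 1 | 0.1%
0.003 | 1 | 0.1%
0.025 | 1 | 0.1%
0.04 | 2 | 0.2%

Maximum 5 values
Value | Count | Frequency (%)
37 | 1 | 0.1%
22 | 1 | 0.1%
21 | 1 | 0.1%
19 | 1 | 0.1%
15 | 1 | 0.1%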

Hormonal Contraceptives
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.8 KiB
1.0: 481
0.0: 269
0.6413333333333333: 108

Length

Max length: 18
Median length: 3
Mean length: 4.888111888
Min length: 3

Characters and Unicode

Total characters: 4194
Distinct characters: 6
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 1.0
5th row: 1.0

Value | Count | Frequency (%)
1.0 | 481 | 56.1%
0.0 | 269 | 31.4%
0.6413333333333333 | 108 | 12.6%
Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
3 | 1404 | 33.5%
0 | 1127 | 26.9%
. | 858 | 20.5%
1 | 589 | 14.0%
6 | 108 | 2.6%
4 | 108 | 2.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3336 | 79.5%
Other Punctuation | 858 | 20.5%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
3 | 1404 | 42.1%
0 | 1127 | 33.8%
1 | 589 | 17.7%
6 | 108 | 3.2%
4 | 108 | 3.2%

Other Punctuation
Value | Count | Frequency (%)
. | 858 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4194 | 100.0%

Most frequent character per script

Common: same counts and frequencies as Most occurring characters.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4194 | 100.0%

Most frequent character per block

ASCII: same counts and frequencies as Most occurring characters.

Hormonal Contraceptives (years)
Real number (ℝ≥0)

ZEROS

Distinct: 41
Distinct (%): 4.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.256419201
Minimum: 0
Maximum: 30
Zeros: 269
Zeros (%): 31.4%
Memory size: 6.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 1
Q3: 2.256419201
95th percentile: 9
Maximum: 30
Range: 30
Interquartile range (IQR): 2.256419201

Descriptive statistics

Standard deviation: 3.519081817
Coefficient of variation (CV): 1.559586896
Kurtosis: 10.76928492
Mean: 2.256419201
Median Absolute Deviation (MAD): 1
Skewness: 2.808474354
Sum: 1936.007675
Variance: 12.38393684
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=41)
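Common values
Value | Count | Frequency (%)
0 | 269 | 31.4%
2.256419201 | 108 | 12.6%
1 | 77 | 9.0%
0.25 | 41 | 4.8%
2 | 40 | 4.7%
3 | 39 | 4.5%
5 | 34 | 4.0%
0.5 | 25 | 2.9%
0.08 | 25 | 2.9%
6 | 24 | 2.8%
Other values (31) | 176 | 20.5%

Minimum 5 values
Value | Count | Frequency (%)
0 | 269 | 31.4%
0.08 | 25 | 2.9%
0.16 | 16 | 1.9%
0.17 | 1 | 0.1%
0.25 | 41 | 4.8%

Maximum 5 values
Value | Count | Frequency (%)
30 | 1 | 0.1%
22 | 1 | 0.1%
20 | 4 | 0.5%
19 | 2 | 0.2%
17 | 1 | 0.1%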

IUD
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.8 KiB
0.0: 658
0.11201079622132254: 117
1.0: 83

Length

Max length: 19
Median length: 3
Mean length: 5.181818182
Min length: 3

Characters and Unicode

Total characters: 4446
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Value | Count | Frequency (%)
0.0 | 658 | 76.7%
0.11201079622132254 | 117 | 13.6%
1.0 | 83 | 9.7%
Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 1750 | 39.4%
. | 858 | 19.3%
2 | 585 | 13.2%
1 | 551 | 12.4%
7 | 117 | 2.6%
9 | 117 | 2.6%
6 | 117 | 2.6%
3 | 117 | 2.6%
5 | 117 | 2.6%
4 | 117 | 2.6%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3588 | 80.7%
Other Punctuation | 858 | 19.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1750 | 48.8%
2 | 585 | 16.3%
1 | 551 | 15.4%
7 | 117 | 3.3%
9 | 117 | 3.3%
6 | 117 | 3.3%
3 | 117 | 3.3%
5 | 117 | 3.3%
4 | 117 | 3.3%

Other Punctuation
Value | Count | Frequency (%)
. | 858 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4446 | 100.0%

Most frequent character per script

Common: same counts and frequencies as Most occurring characters.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4446 | 100.0%

Most frequent character per block

ASCII: same counts and frequencies as Most occurring characters.

IUD (years)
Real number (ℝ≥0)

ZEROS

Distinct: 27
Distinct (%): 3.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.5148043185
Minimum: 0
Maximum: 19
Zeros: 658
Zeros (%): 76.7%
Memory size: 6.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 3
Maximum: 19
Range: 19
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.805585427
Coefficient of variation (CV): 3.507323778
Kurtosis: 35.17115157
Mean: 0.5148043185
Median Absolute Deviation (MAD): 0
Skewness: 5.380678276
Sum: 441.7021053
Variance: 3.260138736
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=27)
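Common values
Value | Count | Frequency (%)
0 | 658 | 76.7%
0.5148043185 | 117 | 13.6%
3 | 11 | 1.3%
2 | 10 | 1.2%
5 | 9 | 1.0%
1 | 8 | 0.9%
7 | 7 | 0.8%
8 | 7 | 0.8%
4 | 5 | 0.6%
6 | 5 | 0.6%
Other values (17) | 21 | 2.4%

Minimum 5 values
Value | Count | Frequency (%)
0 | 658 | 76.7%
0.08 | 2 | 0.2%
0.16 | 1 | 0.1%
0.17 | 1 | 0.1%
0.25 | 1 | 0.1%

Maximum 5 values
Value | Count | Frequency (%)
19 | 1 | 0.1%
17 | 1 | 0.1%
15 | 1 | 0.1%
12 | 1 | 0.1%
11 | 3 | 0.3%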

STDs
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.8 KiB
0.0: 674
0.1049136786188579: 105
1.0: 79

Length

Max length: 18
Median length: 3
Mean length: 4.835664336
Min length: 3

Characters and Unicode

Total characters: 4149
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Value | Count | Frequency (%)
0.0 | 674 | 78.6%
0.1049136786188579 | 105 | 12.2%
1.0 | 79 | 9.2%
Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 1637 | 39.5%
. | 858 | 20.7%
1 | 394 | 9.5%
8 | 315 | 7.6%
9 | 210 | 5.1%
6 | 210 | 5.1%
7 | 210 | 5.1%
4 | 105 | 2.5%
3 | 105 | 2.5%
5 | 105 | 2.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3291 | 79.3%
Other Punctuation | 858 | 20.7%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1637 | 49.7%
1 | 394 | 12.0%
8 | 315 | 9.6%
9 | 210 | 6.4%
6 | 210 | 6.4%
7 | 210 | 6.4%
4 | 105 | 3.2%
3 | 105 | 3.2%
5 | 105 | 3.2%

Other Punctuation
Value | Count | Frequency (%)
. | 858 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4149 | 100.0%

Most frequent character per script

Common: same counts and frequencies as Most occurring characters.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4149 | 100.0%

Most frequent character per block

ASCII: same counts and frequencies as Most occurring characters.

STDs (number)
Real number (ℝ≥0)

HIGH CORRELATION
ZEROS

Distinct: 6
Distinct (%): 0.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.176626826
Minimum: 0
Maximum: 4
Zeros: 674
Zeros (%): 78.6%
Memory size: 6.8 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 0
Q3: 0
95th percentile: 2
Maximum: 4
Range: 4
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 0.5264404944
Coefficient of variation (CV): 2.980524002
Kurtosis: 13.56993065
Mean: 0.176626826
Median Absolute Deviation (MAD): 0
Skewness: 3.631471678
Sum: 151.5458167
Variance: 0.2771395941
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=6)
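Common values
Value | Count | Frequency (%)
0 | 674 | 78.6%
0.176626826 | 105 | 12.2%
2 | 37 | 4.3%
1 | 34 | 4.0%
3 | 7 | 0.8%
4 | 1 | 0.1%

Minimum 5 values
Value | Count | Frequency (%)
0 | 674 | 78.6%
0.176626826 | 105 | 12.2%
1 | 34 | 4.0%
2 | 37 | 4.3%
3 | 7 | 0.8%

Maximum 5 values
Value | Count | Frequency (%)
4 | 1 | 0.1%
3 | 7 | 0.8%
2 | 37 | 4.3%
1 | 34 | 4.0%
0.176626826 | 105 | 12.2%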

STDs:condylomatosis
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.8 KiB
0.0: 709
0.05843293492695883: 105
1.0: 44

Length

Max length: 19
Median length: 3
Mean length: 4.958041958
Min length: 3

Characters and Unicode

Total characters: 4254
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Value | Count | Frequency (%)
0.0 | 709 | 82.6%
0.05843293492695883 | 105 | 12.2%
1.0 | 44 | 5.1%
Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 1672 | 39.3%
. | 858 | 20.2%
8 | 315 | 7.4%
3 | 315 | 7.4%
9 | 315 | 7.4%
5 | 210 | 4.9%
4 | 210 | 4.9%
2 | 210 | 4.9%
6 | 105 | 2.5%
1 | 44 | 1.0%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3396 | 79.8%
Other Punctuation | 858 | 20.2%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1672 | 49.2%
8 | 315 | 9.3%
3 | 315 | 9.3%
9 | 315 | 9.3%
5 | 210 | 6.2%
4 | 210 | 6.2%
2 | 210 | 6.2%
6 | 105 | 3.1%
1 | 44 | 1.3%

Other Punctuation
Value | Count | Frequency (%)
. | 858 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4254 | 100.0%

Most frequent character per script

Common: same counts and frequencies as Most occurring characters.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4254 | 100.0%

Most frequent character per block

ASCII: same counts and frequencies as Most occurring characters.

STDs:cervical condylomatosis
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 6.8 KiB
0.0: 858
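STDs:cervical condylomatosis and STDs:AIDS have the CONSTANT badge because every row holds 0.0. This is a minimal check for such columns over a mapping from column names to values. The helper name and the input layout are assumptions. The example columns are rebuilt from the value counts in this report.

```python
def constant_columns(table):
    """Return the names of columns whose values are all identical."""
    return [name for name, column in table.items() if len(set(column)) <= 1]

columns = {
    "STDs:AIDS": [0.0] * 858,
    "STDs:syphilis": [0.0] * 735 + [0.02390438247011952] * 105 + [1.0] * 18,
}
print(constant_columns(columns))  # ['STDs:AIDS']
```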

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 2574
Distinct characters: 2
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Value | Count | Frequency (%)
0.0 | 858 | 100.0%
Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 1716 | 66.7%
. | 858 | 33.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 1716 | 66.7%
Other Punctuation | 858 | 33.3%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1716 | 100.0%

Other Punctuation
Value | Count | Frequency (%)
. | 858 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 2574 | 100.0%

Most frequent character per script

Common: same counts and frequencies as Most occurring characters.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 2574 | 100.0%

Most frequent character per block

ASCII: same counts and frequencies as Most occurring characters.

STDs:vaginal condylomatosis
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.8 KiB
0.0: 749
0.005312084993359893: 105
1.0: 4

Length

Max length: 20
Median length: 3
Mean length: 5.08041958
Min length: 3

Characters and Unicode

Total characters: 4359
Distinct characters: 9
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Value | Count | Frequency (%)
0.0 | 749 | 87.3%
0.005312084993359893 | 105 | 12.2%
1.0 | 4 | 0.5%
Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 1922 | 44.1%
. | 858 | 19.7%
3 | 420 | 9.6%
9 | 420 | 9.6%
5 | 210 | 4.8%
8 | 210 | 4.8%
1 | 109 | 2.5%
2 | 105 | 2.4%
4 | 105 | 2.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3501 | 80.3%
Other Punctuation | 858 | 19.7%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1922 | 54.9%
3 | 420 | 12.0%
9 | 420 | 12.0%
5 | 210 | 6.0%
8 | 210 | 6.0%
1 | 109 | 3.1%
2 | 105 | 3.0%
4 | 105 | 3.0%

Other Punctuation
Value | Count | Frequency (%)
. | 858 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4359 | 100.0%

Most frequent character per script

Common: same counts and frequencies as Most occurring characters.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4359 | 100.0%

Most frequent character per block

ASCII: same counts and frequencies as Most occurring characters.

STDs:vulvo-perineal condylomatosis
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.8 KiB
0.0: 710
0.057104913678618856: 105
1.0: 43

Length

Max length: 20
Median length: 3
Mean length: 5.08041958
Min length: 3

Characters and Unicode

Total characters: 4359
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Value | Count | Frequency (%)
0.0 | 710 | 82.8%
0.057104913678618856 | 105 | 12.2%
1.0 | 43 | 5.0%
Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 1778 | 40.8%
. | 858 | 19.7%
1 | 358 | 8.2%
6 | 315 | 7.2%
8 | 315 | 7.2%
5 | 210 | 4.8%
7 | 210 | 4.8%
4 | 105 | 2.4%
9 | 105 | 2.4%
3 | 105 | 2.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3501 | 80.3%
Other Punctuation | 858 | 19.7%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1778 | 50.8%
1 | 358 | 10.2%
6 | 315 | 9.0%
8 | 315 | 9.0%
5 | 210 | 6.0%
7 | 210 | 6.0%
4 | 105 | 3.0%
9 | 105 | 3.0%
3 | 105 | 3.0%

Other Punctuation
Value | Count | Frequency (%)
. | 858 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4359 | 100.0%

Most frequent character per script

Common: same counts and frequencies as Most occurring characters.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4359 | 100.0%

Most frequent character per block

ASCII: same counts and frequencies as Most occurring characters.

STDs:syphilis
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.8 KiB
0.0: 735
0.02390438247011952: 105
1.0: 18

Length

Max length: 19
Median length: 3
Mean length: 4.958041958
Min length: 3

Characters and Unicode

Total characters: 4254
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Value | Count | Frequency (%)
0.0 | 735 | 85.7%
0.02390438247011952 | 105 | 12.2%
1.0 | 18 | 2.1%
Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 1908 | 44.9%
. | 858 | 20.2%
2 | 315 | 7.4%
1 | 228 | 5.4%
3 | 210 | 4.9%
9 | 210 | 4.9%
4 | 210 | 4.9%
8 | 105 | 2.5%
7 | 105 | 2.5%
5 | 105 | 2.5%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3396 | 79.8%
Other Punctuation | 858 | 20.2%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1908 | 56.2%
2 | 315 | 9.3%
1 | 228 | 6.7%
3 | 210 | 6.2%
9 | 210 | 6.2%
4 | 210 | 6.2%
8 | 105 | 3.1%
7 | 105 | 3.1%
5 | 105 | 3.1%

Other Punctuation
Value | Count | Frequency (%)
. | 858 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4254 | 100.0%

Most frequent character per script

Common: same counts and frequencies as Most occurring characters.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4254 | 100.0%

Most frequent character per block

ASCII: same counts and frequencies as Most occurring characters.

STDs:pelvic inflammatory disease
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.8 KiB
0.0: 752
0.0013280212483399733: 105
1.0: 1

Length

Max length: 21
Median length: 3
Mean length: 5.202797203
Min length: 3

Characters and Unicode

Total characters: 4464
Distinct characters: 9
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1
Unique (%): 0.1%
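Unique counts the values that occur exactly once. Here the only such value is 1.0, which appears in a single row (1 / 858 ≈ 0.1%). Smokes, whose values all repeat, reports Unique: 0. This sketch rebuilds the column from its value counts, which is an illustrative assumption, and the helper name is hypothetical.

```python
from collections import Counter

def unique_count(values):
    """Number of distinct values that occur exactly once."""
    return sum(1 for count in Counter(values).values() if count == 1)

pid = ["0.0"] * 752 + ["0.0013280212483399733"] * 105 + ["1.0"]
n_unique = unique_count(pid)
print(n_unique, f"{n_unique / len(pid):.1%}")  # 1 0.1%
```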

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Value | Count | Frequency (%)
0.0 | 752 | 87.6%
0.0013280212483399733 | 105 | 12.2%
1.0 | 1 | 0.1%
Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 1925 | 43.1%
. | 858 | 19.2%
3 | 525 | 11.8%
2 | 315 | 7.1%
1 | 211 | 4.7%
8 | 210 | 4.7%
9 | 210 | 4.7%
4 | 105 | 2.4%
7 | 105 | 2.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3606 | 80.8%
Other Punctuation | 858 | 19.2%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1925 | 53.4%
3 | 525 | 14.6%
2 | 315 | 8.7%
1 | 211 | 5.9%
8 | 210 | 5.8%
9 | 210 | 5.8%
4 | 105 | 2.9%
7 | 105 | 2.9%

Other Punctuation
Value | Count | Frequency (%)
. | 858 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4464 | 100.0%

Most frequent character per script

Common: same counts and frequencies as Most occurring characters.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4464 | 100.0%

Most frequent character per block

ASCII: same counts and frequencies as Most occurring characters.

STDs:genital herpes
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.8 KiB
0.0: 752
0.0013280212483399733: 105
1.0: 1

Length

Max length: 21
Median length: 3
Mean length: 5.202797203
Min length: 3

Characters and Unicode

Total characters: 4464
Distinct characters: 9
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Value | Count | Frequency (%)
0.0 | 752 | 87.6%
0.0013280212483399733 | 105 | 12.2%
1.0 | 1 | 0.1%
Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 1925 | 43.1%
. | 858 | 19.2%
3 | 525 | 11.8%
2 | 315 | 7.1%
1 | 211 | 4.7%
8 | 210 | 4.7%
9 | 210 | 4.7%
4 | 105 | 2.4%
7 | 105 | 2.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3606 | 80.8%
Other Punctuation | 858 | 19.2%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1925 | 53.4%
3 | 525 | 14.6%
2 | 315 | 8.7%
1 | 211 | 5.9%
8 | 210 | 5.8%
9 | 210 | 5.8%
4 | 105 | 2.9%
7 | 105 | 2.9%

Other Punctuation
Value | Count | Frequency (%)
. | 858 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4464 | 100.0%

Most frequent character per script

Common: same counts and frequencies as Most occurring characters.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4464 | 100.0%

Most frequent character per block

ASCII: same counts and frequencies as Most occurring characters.

STDs:molluscum contagiosum
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 6.8 KiB
0.0: 752
0.0013280212483399733: 105
1.0: 1

Length

Max length: 21
Median length: 3
Mean length: 5.202797203
Min length: 3

Characters and Unicode

Total characters: 4464
Distinct characters: 9
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 1
Unique (%): 0.1%

Sample

1st row: 0.0
2nd row: 0.0
3rd row: 0.0
4th row: 0.0
5th row: 0.0

Value | Count | Frequency (%)
0.0 | 752 | 87.6%
0.0013280212483399733 | 105 | 12.2%
1.0 | 1 | 0.1%
Histogram of lengths of the category

Most occurring characters

Value | Count | Frequency (%)
0 | 1925 | 43.1%
. | 858 | 19.2%
3 | 525 | 11.8%
2 | 315 | 7.1%
1 | 211 | 4.7%
8 | 210 | 4.7%
9 | 210 | 4.7%
4 | 105 | 2.4%
7 | 105 | 2.4%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 3606 | 80.8%
Other Punctuation | 858 | 19.2%

Most frequent character per category

Decimal Number
Value | Count | Frequency (%)
0 | 1925 | 53.4%
3 | 525 | 14.6%
2 | 315 | 8.7%
1 | 211 | 5.9%
8 | 210 | 5.8%
9 | 210 | 5.8%
4 | 105 | 2.9%
7 | 105 | 2.9%

Other Punctuation
Value | Count | Frequency (%)
. | 858 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Common | 4464 | 100.0%

Most frequent character per script

Common: same counts and frequencies as Most occurring characters.

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4464 | 100.0%

Most frequent character per block

ASCII: same counts and frequencies as Most occurring characters.

STDs:AIDS
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct1
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size6.8 KiB
0.0
858 

Length

Max length3
Median length3
Mean length3
Min length3

Characters and Unicode

Total characters2574
Distinct characters2
Distinct categories2
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row0.0
2nd row0.0
3rd row0.0
4th row0.0
5th row0.0
Value  Count  Frequency (%)
0.0    858    100.0%
Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
0      1716   66.7%
.      858    33.3%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     1716   66.7%
Other Punctuation  858    33.3%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      1716   100.0%

Other Punctuation
Value  Count  Frequency (%)
.      858    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  2574   100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      1716   66.7%
.      858    33.3%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  2574   100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      1716   66.7%
.      858    33.3%

STDs:HIV
Categorical

HIGH CORRELATION

Distinct3
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size6.8 KiB
0.0
735 
0.02390438247011952
105 
1.0
 
18

Length

Max length19
Median length3
Mean length4.958041958
Min length3

Characters and Unicode

Total characters4254
Distinct characters10
Distinct categories2
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row0.0
2nd row0.0
3rd row0.0
4th row0.0
5th row0.0
Value                Count  Frequency (%)
0.0                  735    85.7%
0.02390438247011952  105    12.2%
1.0                  18     2.1%
Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
0      1908   44.9%
.      858    20.2%
2      315    7.4%
1      228    5.4%
3      210    4.9%
9      210    4.9%
4      210    4.9%
8      105    2.5%
7      105    2.5%
5      105    2.5%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     3396   79.8%
Other Punctuation  858    20.2%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      1908   56.2%
2      315    9.3%
1      228    6.7%
3      210    6.2%
9      210    6.2%
4      210    6.2%
8      105    3.1%
7      105    3.1%
5      105    3.1%

Other Punctuation
Value  Count  Frequency (%)
.      858    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  4254   100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      1908   44.9%
.      858    20.2%
2      315    7.4%
1      228    5.4%
3      210    4.9%
9      210    4.9%
4      210    4.9%
8      105    2.5%
7      105    2.5%
5      105    2.5%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4254   100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      1908   44.9%
.      858    20.2%
2      315    7.4%
1      228    5.4%
3      210    4.9%
9      210    4.9%
4      210    4.9%
8      105    2.5%
7      105    2.5%
5      105    2.5%

STDs:Hepatitis B
Categorical

HIGH CORRELATION

Distinct3
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size6.8 KiB
0.0
752 
0.0013280212483399733
105 
1.0
 
1

Length

Max length21
Median length3
Mean length5.202797203
Min length3

Characters and Unicode

Total characters4464
Distinct characters9
Distinct categories2
Distinct scripts1
Distinct blocks1

Unique

Unique1
Unique (%)0.1%

Sample

1st row0.0
2nd row0.0
3rd row0.0
4th row0.0
5th row0.0
Value                  Count  Frequency (%)
0.0                    752    87.6%
0.0013280212483399733  105    12.2%
1.0                    1      0.1%
Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
0      1925   43.1%
.      858    19.2%
3      525    11.8%
2      315    7.1%
1      211    4.7%
8      210    4.7%
9      210    4.7%
4      105    2.4%
7      105    2.4%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     3606   80.8%
Other Punctuation  858    19.2%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      1925   53.4%
3      525    14.6%
2      315    8.7%
1      211    5.9%
8      210    5.8%
9      210    5.8%
4      105    2.9%
7      105    2.9%

Other Punctuation
Value  Count  Frequency (%)
.      858    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  4464   100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      1925   43.1%
.      858    19.2%
3      525    11.8%
2      315    7.1%
1      211    4.7%
8      210    4.7%
9      210    4.7%
4      105    2.4%
7      105    2.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4464   100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      1925   43.1%
.      858    19.2%
3      525    11.8%
2      315    7.1%
1      211    4.7%
8      210    4.7%
9      210    4.7%
4      105    2.4%
7      105    2.4%

STDs:HPV
Categorical

HIGH CORRELATION

Distinct3
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Memory size6.8 KiB
0.0
751 
0.0026560424966799467
105 
1.0
 
2

Length

Max length21
Median length3
Mean length5.202797203
Min length3

Characters and Unicode

Total characters4464
Distinct characters9
Distinct categories2
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row0.0
2nd row0.0
3rd row0.0
4th row0.0
5th row0.0
Value                  Count  Frequency (%)
0.0                    751    87.5%
0.0026560424966799467  105    12.2%
1.0                    2      0.2%
Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
0      1924   43.1%
.      858    19.2%
6      525    11.8%
4      315    7.1%
9      315    7.1%
2      210    4.7%
7      210    4.7%
5      105    2.4%
1      2      < 0.1%

Most occurring categories

Value              Count  Frequency (%)
Decimal Number     3606   80.8%
Other Punctuation  858    19.2%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      1924   53.4%
6      525    14.6%
4      315    8.7%
9      315    8.7%
2      210    5.8%
7      210    5.8%
5      105    2.9%
1      2      0.1%

Other Punctuation
Value  Count  Frequency (%)
.      858    100.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  4464   100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      1924   43.1%
.      858    19.2%
6      525    11.8%
4      315    7.1%
9      315    7.1%
2      210    4.7%
7      210    4.7%
5      105    2.4%
1      2      < 0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  4464   100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      1924   43.1%
.      858    19.2%
6      525    11.8%
4      315    7.1%
9      315    7.1%
2      210    4.7%
7      210    4.7%
5      105    2.4%
1      2      < 0.1%

STDs: Number of diagnosis
Categorical

HIGH CORRELATION
HIGH CORRELATION

Distinct4
Distinct (%)0.5%
Missing0
Missing (%)0.0%
Memory size6.8 KiB
0
787 
1
 
68
2
 
2
3
 
1

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters858
Distinct characters4
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique1
Unique (%)0.1%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0
Value  Count  Frequency (%)
0      787    91.7%
1      68     7.9%
2      2      0.2%
3      1      0.1%
Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
0      787    91.7%
1      68     7.9%
2      2      0.2%
3      1      0.1%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  858    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      787    91.7%
1      68     7.9%
2      2      0.2%
3      1      0.1%

Most occurring scripts

Value   Count  Frequency (%)
Common  858    100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      787    91.7%
1      68     7.9%
2      2      0.2%
3      1      0.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  858    100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      787    91.7%
1      68     7.9%
2      2      0.2%
3      1      0.1%

Dx:Cancer
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size6.8 KiB
0
840 
1
 
18

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters858
Distinct characters2
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row1
5th row0
Value  Count  Frequency (%)
0      840    97.9%
1      18     2.1%
Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
0      840    97.9%
1      18     2.1%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  858    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      840    97.9%
1      18     2.1%

Most occurring scripts

Value   Count  Frequency (%)
Common  858    100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      840    97.9%
1      18     2.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  858    100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      840    97.9%
1      18     2.1%

Dx:CIN
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size6.8 KiB
0
849 
1
 
9

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters858
Distinct characters2
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0
Value  Count  Frequency (%)
0      849    99.0%
1      9      1.0%
Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
0      849    99.0%
1      9      1.0%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  858    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      849    99.0%
1      9      1.0%

Most occurring scripts

Value   Count  Frequency (%)
Common  858    100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      849    99.0%
1      9      1.0%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  858    100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      849    99.0%
1      9      1.0%

Dx:HPV
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size6.8 KiB
0
840 
1
 
18

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters858
Distinct characters2
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row1
5th row0
Value  Count  Frequency (%)
0      840    97.9%
1      18     2.1%
Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
0      840    97.9%
1      18     2.1%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  858    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      840    97.9%
1      18     2.1%

Most occurring scripts

Value   Count  Frequency (%)
Common  858    100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      840    97.9%
1      18     2.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  858    100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      840    97.9%
1      18     2.1%

Dx
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size6.8 KiB
0
834 
1
 
24

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters858
Distinct characters2
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0
Value  Count  Frequency (%)
0      834    97.2%
1      24     2.8%
Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
0      834    97.2%
1      24     2.8%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  858    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      834    97.2%
1      24     2.8%

Most occurring scripts

Value   Count  Frequency (%)
Common  858    100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      834    97.2%
1      24     2.8%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  858    100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      834    97.2%
1      24     2.8%

Hinselmann
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size6.8 KiB
0
823 
1
 
35

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters858
Distinct characters2
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0
Value  Count  Frequency (%)
0      823    95.9%
1      35     4.1%
Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
0      823    95.9%
1      35     4.1%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  858    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      823    95.9%
1      35     4.1%

Most occurring scripts

Value   Count  Frequency (%)
Common  858    100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      823    95.9%
1      35     4.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  858    100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      823    95.9%
1      35     4.1%

Schiller
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size6.8 KiB
0
784 
1
 
74

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters858
Distinct characters2
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0
Value  Count  Frequency (%)
0      784    91.4%
1      74     8.6%
Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
0      784    91.4%
1      74     8.6%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  858    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      784    91.4%
1      74     8.6%

Most occurring scripts

Value   Count  Frequency (%)
Common  858    100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      784    91.4%
1      74     8.6%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  858    100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      784    91.4%
1      74     8.6%

Citology
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size6.8 KiB
0
814 
1
 
44

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters858
Distinct characters2
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0
Value  Count  Frequency (%)
0      814    94.9%
1      44     5.1%
Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
0      814    94.9%
1      44     5.1%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  858    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      814    94.9%
1      44     5.1%

Most occurring scripts

Value   Count  Frequency (%)
Common  858    100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      814    94.9%
1      44     5.1%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  858    100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      814    94.9%
1      44     5.1%

Biopsy
Categorical

HIGH CORRELATION

Distinct2
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Memory size6.8 KiB
0
803 
1
 
55

Length

Max length1
Median length1
Mean length1
Min length1

Characters and Unicode

Total characters858
Distinct characters2
Distinct categories1
Distinct scripts1
Distinct blocks1

Unique

Unique0
Unique (%)0.0%

Sample

1st row0
2nd row0
3rd row0
4th row0
5th row0
Value  Count  Frequency (%)
0      803    93.6%
1      55     6.4%
Histogram of lengths of the category

Most occurring characters

Value  Count  Frequency (%)
0      803    93.6%
1      55     6.4%

Most occurring categories

Value           Count  Frequency (%)
Decimal Number  858    100.0%

Most frequent character per category

Decimal Number
Value  Count  Frequency (%)
0      803    93.6%
1      55     6.4%

Most occurring scripts

Value   Count  Frequency (%)
Common  858    100.0%

Most frequent character per script

Value  Count  Frequency (%)
0      803    93.6%
1      55     6.4%

Most occurring blocks

Value  Count  Frequency (%)
ASCII  858    100.0%

Most frequent character per block

Value  Count  Frequency (%)
0      803    93.6%
1      55     6.4%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship the steepness of the line, i.e. its angle to the x-axis, does not affect r; only the sign of the slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
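A minimal pure-Python sketch of that definition, assuming two equal-length numeric sequences (`pearson_r` is an illustrative name, not part of the report's tooling). The 1/n factors of the covariance and of both standard deviations cancel, so plain sums are used. The example also shows the invariance described above: lines of different steepness give the same r, and only the sign of the slope matters.

```python
import math


def pearson_r(x, y):
    """Pearson's r: covariance of x and y over the product of their standard deviations.

    The 1/n factors in the covariance and in both standard deviations cancel,
    so plain sums of products and squares are used.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        return float("nan")  # undefined when either variable is constant
    return cov / math.sqrt(ss_x * ss_y)


x = [1.0, 2.0, 3.0, 4.0]
print(pearson_r(x, [3 * v + 7 for v in x]))    # steep line: 1.0
print(pearson_r(x, [0.1 * v - 2 for v in x]))  # shallow line: approximately 1.0
print(pearson_r(x, [-2 * v for v in x]))       # negative slope: -1.0
```

A constant column, such as STDs:AIDS in this dataset, has zero standard deviation, so r is undefined for it; the sketch returns NaN in that case.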

Spearman's ρ

Spearman's rank correlation coefficient (ρ) measures the monotonic correlation between two variables and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
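A sketch of that recipe, assuming tied values receive the average of the ranks they span (a common convention; the report does not state its tie handling). `average_ranks` and `spearman_rho` are illustrative names. The example pairs x with x³: the relationship is monotonic but not linear, so ρ is 1 while Pearson's r on the same data would be below 1.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of the positions they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position of the run of values tied with order[i].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: covariance of the rank variables over the product of their
    standard deviations (the 1/n factors cancel, so root sums of squares are used)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    if spread_x == 0 or spread_y == 0:
        return float("nan")  # a constant variable has no rank variation
    return cov / (spread_x * spread_y)


x = [1, 2, 3, 4, 5]
print(spearman_rho(x, [v ** 3 for v in x]))  # monotonic, not linear: approximately 1.0
print(spearman_rho(x, [5, 4, 3, 2, 1]))      # reversed order: approximately -1.0
```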

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

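To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations: a pair of observations (x_i, y_i), (x_j, y_j) is concordant if x_i - x_j and y_i - y_j have the same sign and discordant if they have opposite signs. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.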
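A direct O(n²) sketch of that definition, which is the τ-a form (the function name `kendall_tau_a` is hypothetical). With many ties, as in the 0/1 columns of this dataset, library implementations usually report the tie-adjusted τ-b instead; SciPy's `scipy.stats.kendalltau`, for example, defaults to τ-b, so its values can differ from this sketch.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / (n(n-1)/2).

    Enumerates all pairs, so it is O(n^2). A pair tied in x or y counts as
    neither concordant nor discordant but is still in the denominator.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant: 4/6
```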

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The plain empirical estimator of Cramér's V has been shown to be biased, even for large samples, so we use the bias-corrected measure proposed by Bergsma (2013).
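A sketch of the Bergsma (2013) correction as it is commonly stated, written in plain Python for two columns of category labels; it is not taken from the report's implementation, and `cramers_v_corrected` is a hypothetical name. With χ² from the r × k contingency table and n observations, it reduces φ² = χ²/n by (k - 1)(r - 1)/(n - 1) (clipped at 0), shrinks the dimensions to r - (r - 1)²/(n - 1) and k - (k - 1)²/(n - 1), and divides by the smaller corrected dimension minus 1 before taking the square root.

```python
import math
from collections import Counter


def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V (Bergsma 2013 correction) for two label sequences.

    Returns NaN when either variable has only one category, because the
    corrected denominator is then zero.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    cells = Counter(zip(x, y))
    row_totals, col_totals = Counter(x), Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared over every cell of the r x k contingency table,
    # including empty cells (a Counter returns 0 for unseen pairs).
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (cells[(a, b)] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return float("nan")
    return math.sqrt(phi2_corr / denom)


same = [0] * 50 + [1] * 50
print(cramers_v_corrected(same, same))                             # perfect association: close to 1
print(cramers_v_corrected([0, 0, 1, 1] * 25, [0, 1, 0, 1] * 25))  # independent: 0.0
```

A variable with a single category, such as the constant STDs:AIDS column, leaves the corrected denominator at zero; the sketch returns NaN there rather than a misleading number.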

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion by eye.
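The report draws the nullity matrix as an image. A rough text analogue is sketched below, assuming rows arrive as dictionaries and that `None`, a float NaN, or an absent key marks a missing cell; the function names and sample rows are hypothetical. Present cells print as `#` and missing ones as `.`. This dataset reports no missing cells, so on the real data every row would print as all `#`.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing cells."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def nullity_matrix(rows, columns):
    """One list of booleans per row, True where that column's cell is missing.

    rows is a list of dicts; a key absent from a row also counts as missing.
    """
    return [[is_missing(row.get(col)) for col in columns] for row in rows]


def render(matrix):
    """Draw the matrix as text: '#' for a present value, '.' for a missing one."""
    return "\n".join("".join("." if cell else "#" for cell in row) for row in matrix)


# Hypothetical rows, only for illustration.
rows = [
    {"Age": 18, "Smokes": 0.0},
    {"Age": 15, "Smokes": float("nan")},
    {"Age": 34},
]
print(render(nullity_matrix(rows, ["Age", "Smokes"])))
# ##
# #.
# #.
```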

Sample

First rows

AgeNumber of sexual partnersFirst sexual intercourseNum of pregnanciesSmokesSmokes (years)Smokes (packs/year)Hormonal ContraceptivesHormonal Contraceptives (years)IUDIUD (years)STDsSTDs (number)STDs:condylomatosisSTDs:cervical condylomatosisSTDs:vaginal condylomatosisSTDs:vulvo-perineal condylomatosisSTDs:syphilisSTDs:pelvic inflammatory diseaseSTDs:genital herpesSTDs:molluscum contagiosumSTDs:AIDSSTDs:HIVSTDs:Hepatitis BSTDs:HPVSTDs: Number of diagnosisDx:CancerDx:CINDx:HPVDxHinselmannSchillerCitologyBiopsy
0184.015.00001.0000000.00.0000000.00.00.00.0000000.0000000.00.00.00.00.00.00.00.00.00.00.00.00.00.0000000000
1151.014.00001.0000000.00.0000000.00.00.00.0000000.0000000.00.00.00.00.00.00.00.00.00.00.00.00.00.0000000000
2341.016.99531.0000000.00.0000000.00.00.00.0000000.0000000.00.00.00.00.00.00.00.00.00.00.00.00.00.0000000000
3525.016.00004.0000001.037.00000037.01.03.00.0000000.0000000.00.00.00.00.00.00.00.00.00.00.00.00.00.0010100000
4463.021.00004.0000000.00.0000000.01.015.00.0000000.0000000.00.00.00.00.00.00.00.00.00.00.00.00.00.0000000000
5423.023.00002.0000000.00.0000000.00.00.00.0000000.0000000.00.00.00.00.00.00.00.00.00.00.00.00.00.0000000000
6513.017.00006.0000001.034.0000003.40.00.01.0000007.0000000.00.00.00.00.00.00.00.00.00.00.00.00.00.0000001101
7261.026.00003.0000000.00.0000000.01.02.01.0000007.0000000.00.00.00.00.00.00.00.00.00.00.00.00.00.0000000000
8451.020.00005.0000000.00.0000000.00.00.00.0000000.0000000.00.00.00.00.00.00.00.00.00.00.00.00.00.0010110000
9443.015.00002.2755611.01.2669732.80.00.00.1120110.5148040.00.00.00.00.00.00.00.00.00.00.00.00.00.0000000000

Last rows

AgeNumber of sexual partnersFirst sexual intercourseNum of pregnanciesSmokesSmokes (years)Smokes (packs/year)Hormonal ContraceptivesHormonal Contraceptives (years)IUDIUD (years)STDsSTDs (number)STDs:condylomatosisSTDs:cervical condylomatosisSTDs:vaginal condylomatosisSTDs:vulvo-perineal condylomatosisSTDs:syphilisSTDs:pelvic inflammatory diseaseSTDs:genital herpesSTDs:molluscum contagiosumSTDs:AIDSSTDs:HIVSTDs:Hepatitis BSTDs:HPVSTDs: Number of diagnosisDx:CancerDx:CINDx:HPVDxHinselmannSchillerCitologyBiopsy
848313.018.01.00.00.00.001.00.500.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0000000000
849323.018.01.01.011.00.161.06.000.00.01.01.00.00.00.00.00.00.00.00.00.00.00.01.0010100000
850191.014.00.00.00.00.000.00.000.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0000000000
851232.015.02.00.00.00.000.00.000.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0000000000
852433.017.03.00.00.00.001.05.000.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0000000000
853343.018.00.00.00.00.000.00.000.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0000000000
854322.019.01.00.00.00.001.08.000.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0000000000
855252.017.00.00.00.00.001.00.080.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0000000010
856332.024.02.00.00.00.001.00.080.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0000000000
857292.020.01.00.00.00.001.00.500.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.0000000000

Duplicate rows

Most frequent

AgeNumber of sexual partnersFirst sexual intercourseNum of pregnanciesSmokesSmokes (years)Smokes (packs/year)Hormonal ContraceptivesHormonal Contraceptives (years)IUDIUD (years)STDsSTDs (number)STDs:condylomatosisSTDs:cervical condylomatosisSTDs:vaginal condylomatosisSTDs:vulvo-perineal condylomatosisSTDs:syphilisSTDs:pelvic inflammatory diseaseSTDs:genital herpesSTDs:molluscum contagiosumSTDs:AIDSSTDs:HIVSTDs:Hepatitis BSTDs:HPVSTDs: Number of diagnosisDx:CancerDx:CINDx:HPVDxHinselmannSchillerCitologyBiopsycount
0151.014.01.00.00.00.00.0000000.0000000.0000000.0000000.0000000.0000000.0000000.00.0000000.0000000.0000000.0000000.0000000.0000000.00.0000000.0000000.0000000000000004
7172.015.01.00.00.00.00.0000000.0000000.0000000.0000000.0000000.0000000.0000000.00.0000000.0000000.0000000.0000000.0000000.0000000.00.0000000.0000000.0000000000000003
1151.015.01.00.00.00.00.6413332.2564190.1120110.5148040.1049140.1766270.0584330.00.0053120.0571050.0239040.0013280.0013280.0013280.00.0239040.0013280.0026560000000002
2152.014.01.00.00.00.00.0000000.0000000.0000000.0000000.0000000.0000000.0000000.00.0000000.0000000.0000000.0000000.0000000.0000000.00.0000000.0000000.0000000000000002
3161.014.01.00.00.00.00.0000000.0000000.0000000.0000000.0000000.0000000.0000000.00.0000000.0000000.0000000.0000000.0000000.0000000.00.0000000.0000000.0000000000000002
4161.015.01.00.00.00.00.0000000.0000000.0000000.0000000.0000000.0000000.0000000.00.0000000.0000000.0000000.0000000.0000000.0000000.00.0000000.0000000.0000000000000002
5171.016.01.00.00.00.00.6413332.2564190.1120110.5148040.1049140.1766270.0584330.00.0053120.0571050.0239040.0013280.0013280.0013280.00.0239040.0013280.0026560000000002
6171.017.01.00.00.00.00.0000000.0000000.0000000.0000000.0000000.0000000.0000000.00.0000000.0000000.0000000.0000000.0000000.0000000.00.0000000.0000000.0000000000000002
8172.015.01.00.00.00.01.0000000.3300000.0000000.0000000.0000000.0000000.0000000.00.0000000.0000000.0000000.0000000.0000000.0000000.00.0000000.0000000.0000000000000002
9181.014.02.00.00.00.00.0000000.0000000.0000000.0000000.0000000.0000000.0000000.00.0000000.0000000.0000000.0000000.0000000.0000000.00.0000000.0000000.0000000000000002